Learning Representations for Hierarchies with Minimal Support

Rozonoyer, Benjamin; Boratko, Michael; Patel, Dhruvesh; Zhao, Wenlong; Dasgupta, Shib Sankar; Le, Hung; McCallum, Andrew (September 2024, NeurIPS)

When training node embedding models to represent large directed graphs (digraphs), it is impossible to observe all entries of the adjacency matrix during training. As a consequence, most methods employ sampling. For very large digraphs, however, this means that many (or even most) entries may go unobserved during training. In general, observing every entry would be necessary to uniquely identify a graph; however, if we know the graph has a certain property, some entries can be omitted. For example, only half the entries would be required for a symmetric graph. In this work, we develop a novel framework to identify a subset of entries required to uniquely distinguish a graph among all transitively-closed DAGs. We give an explicit algorithm to compute the provably minimal set of entries, and demonstrate empirically that one can train node embedding models with greater efficiency and performance, provided the energy function has an appropriate inductive bias. We achieve robust performance on synthetic hierarchies and a larger real-world taxonomy, observing improved convergence rates in a resource-constrained setting while reducing the set of training examples by as much as 99%.
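
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the underlying intuition, not the authors' method. It uses the standard transitive reduction to show that, once a DAG is known to be transitively closed, many of its present edges are implied by others and need not be listed. The toy taxonomy, the function names, and the restriction to present edges are all assumptions made for illustration; the sketch says nothing about which absent entries would also have to be observed.

```python
def transitive_closure(nodes, edges):
    """Return every pair (u, v) such that a directed path leads from u to v."""
    closure = set(edges)
    # Warshall-style update: the intermediate node k is the outermost loop.
    for k in nodes:
        for u in nodes:
            for v in nodes:
                if (u, k) in closure and (k, v) in closure:
                    closure.add((u, v))
    return closure


def transitive_reduction(nodes, closed_edges):
    """Given the edges of a transitively closed DAG, keep only the edges
    (u, v) that are not implied by a two-step path u -> w -> v."""
    return {
        (u, v)
        for (u, v) in closed_edges
        if not any((u, w) in closed_edges and (w, v) in closed_edges for w in nodes)
    }


# Hypothetical toy hierarchy, with edges pointing from parent to child.
nodes = ["animal", "mammal", "dog", "cat"]
edges = [("animal", "mammal"), ("mammal", "dog"), ("mammal", "cat")]

closed = transitive_closure(nodes, edges)     # all ancestor/descendant pairs
reduced = transitive_reduction(nodes, closed) # edges not implied by transitivity

print(len(closed), "edges in the closure;", len(reduced), "needed to recover them")
```

In this toy case, the closure of the reduced edge set reproduces the full set of present edges, which is the sense in which the omitted edges carry no extra information under the transitivity assumption.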
